Conversation

@cdoern
Contributor

@cdoern cdoern commented Mar 20, 2024

This is the implementation of the MachineOSConfig and MachineOSBuild APIs in the MCO.

All usages of the MachineConfigPool's config and status for storing build information in the build controller have been removed. Instead, we now depend on a per-pool MachineOSConfig for user-level on-cluster build (OCB) options and a per-build MachineOSBuild object created in its wake.

A sample MachineOSConfig looks as follows:

apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: global-pull-secret-copy
    renderedImagePushSecret:
      name: cdoern-ocb-push-secret
    renderedImagePushspec: quay.io/cdoern/origin-release:latest

This is the bare minimum spec needed to apply a MachineOSConfig object.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 20, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Mar 20, 2024

@cdoern: This pull request references MCO-1042 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

In response to this:

This is a draft to test the sanity of getting the MachineOSConfig and MachineOSBuild APIs working in MCO.

All usages of pool config and status in the build controller to store build information have been removed. Instead we now depend on a per-pool MachineOSConfig for user level OCB options and a per-build MachineOSBuild object created in its wake.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 20, 2024
@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2024

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 20, 2024
@openshift-ci
Contributor

openshift-ci bot commented Mar 20, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cdoern

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 20, 2024
}
}

// NewMachineConfigPoolCondition creates a new MachineConfigPool condition.
Member

praise: Nice additions!


// NewMachineOSBuildCondition creates a new MachineOSBuild condition.
func NewMachineOSBuildCondition(condType string, status metav1.ConditionStatus, reason, message string) *metav1.Condition {
return &metav1.Condition{
Member

thought: Would it make sense to put these helpers in a separate file called, e.g., machineosbuildcondition.go? This would be purely for organizational purposes.
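
For reference, the body of this helper presumably just fills in the standard metav1.Condition fields. A minimal sketch of what the elided body likely looks like (reconstructed, not copied from the PR; assumes the usual metav1 import from k8s.io/apimachinery/pkg/apis/meta/v1):

// NewMachineOSBuildCondition creates a new MachineOSBuild condition.
func NewMachineOSBuildCondition(condType string, status metav1.ConditionStatus, reason, message string) *metav1.Condition {
	return &metav1.Condition{
		Type:               condType,
		Status:             status,
		LastTransitionTime: metav1.Now(),
		Reason:             reason,
		Message:            message,
	}
}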

// equal to what the pool expects.
func (l *LayeredNodeState) IsUnavailable(mcp *mcfgv1.MachineConfigPool) bool {
return isNodeUnavailable(l.node) && l.isDesiredImageEqualToPool(mcp)
func (l *LayeredNodeState) IsUnavailable(mcp *mcfgv1.MachineConfigPool, layered bool) bool {
@cheesesashimi cheesesashimi (Member) Mar 20, 2024

question: Is there an advantage to passing in whether a pool is layered or not instead of inferring it from the pool object?

I ask because when I wrote this, I intended for LayeredNodeState to infer whether the pool is layered or not. To be clear: I'm fine with this change, I just want to understand the reason for it 😄

EDIT: I think I understand the reason for this change, please correct me if I'm wrong: We're now inferring whether a pool is layered based on both the presence of the layered label and the presence of an associated MachineOSBuild and MachineOSConfig object, correct?

Contributor Author

we should only depend on the existence of the MachineOSConfig object as the opt-in mechanism for layering.

I kept the label thing here just for testing
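
A sketch of that opt-in rule as a single check (name and signature are illustrative; the actual nil checks live in IsLayeredPool(), as noted later in this review):

// Illustrative: layering opt-in is inferred from object existence rather
// than from a label on the pool. A pool is treated as layered once a
// MachineOSConfig exists for it (and a MachineOSBuild once a build starts).
func (ctrl *Controller) isLayeredPool(mosc *mcfgv1alpha1.MachineOSConfig, mosb *mcfgv1alpha1.MachineOSBuild) bool {
	return mosc != nil || mosb != nil
}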

val, ok := l.node.Annotations[anno]

if lps.IsLayered() && lps.HasOSImage() {
if lps.HasOSImage() {
Member

note: On the surface, it makes sense that a pool with an OS image is layered. But there are two corner cases that checking for both the presence of the layered label (IsLayered()) and the presence of the OS image annotation (HasOSImage()) catches:

  1. When a pool is opted into layering, it won't immediately have an OS image because of the time it takes BuildController to build the image and apply it to the pool. In this situation, the node image annotation will not be equal to the pool, so this check should return false.
  2. When a pool is opted out of layering, the OS image will be removed from the pool. In this situation, the node will have the image annotation when the pool does not have an OS image, so this check should return false.
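
The shape of the combined check being defended here, as a sketch (helper names approximate the surrounding code, not verbatim):

val, ok := l.node.Annotations[anno]
if lps.IsLayered() && lps.HasOSImage() {
	// Opted in and built: the node's image annotation must match the pool's.
	return ok && val == lps.OSImage()
}
// Not (fully) layered: the node should not carry the image annotation at
// all, which covers corner cases 1 and 2 above.
return !ok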

// OS image value, if the value is available. If the pool is not layered, then
// any image annotations should not be present on the node.
func (l *LayeredNodeState) isImageAnnotationEqualToPool(anno string, mcp *mcfgv1.MachineConfigPool) bool {
func (l *LayeredNodeState) isImageAnnotationEqualToPool(anno string, mcp *mcfgv1.MachineConfigPool, layered bool) bool {
Member

quibble: The layering parameter is unused in this function.

}

for _, config := range configList {
if config.Spec.MachineConfigPool.Name == pool.Name {
Member

thought: I wish this was more ergonomic. I know that listers support label selectors, but then we'd need something to set a label on each MachineOSConfig and MachineOSBuild object. It'd be nice if listers supported field selectors too.

(To be clear: I'm not asking you to add labels.)

Contributor Author

Agreed; I think for GA we should support matching labels on these new objects.

for _, config := range configList {
if config.Spec.MachineConfigPool.Name == pool.Name {
ourConfig = config
break
@cheesesashimi cheesesashimi (Member) Mar 20, 2024

thought: What should happen if ourConfig / ourBuild is nil, meaning that there is no config or build for the given MachineConfigPool? Should we return an error in that case?

EDIT: Oh, I see that you've added the nil check to IsLayeredPool().

// ready so we can update both the nodes' desired MachineConfig and desired
// image annotations simultaneously.

func (ctrl *Controller) GetConfigAndBuild(pool *mcfgv1.MachineConfigPool) (*mcfgv1alpha1.MachineOSConfig, *mcfgv1alpha1.MachineOSBuild, error) {
@cheesesashimi cheesesashimi (Member) Mar 20, 2024

quibble: If this method is not intended to be used outside of the NodeController, it should be renamed to getConfigAndBuild() because of Go's export rules.


// If the MachineConfigPool has the build object reference, we just want to
// update the MachineConfigPool's status.
if ps.HasBuildObjectRef(objRef) {
@cheesesashimi cheesesashimi (Member) Mar 20, 2024

thought: This never worked quite the way that I wanted it to only because RenderController would wipe out the object reference to the running build. In the long-run, I'd like to move away from this as the way that we're associating a running build with a MachineConfigPool. Instead, I think adding labels to the MachineConfigPool and the ephemeral build objects that we can use label queries for might be the better option.

(To be clear: I'm not asking you to change anything; I'm commenting on something that I think may have been a mistake on my part.)
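
A sketch of that label-based association (the label key and lister usage here are hypothetical, not the PR's API):

// Hypothetical: find the ephemeral build objects for a pool with a label
// query instead of an object reference stored on the MachineConfigPool.
selector := labels.SelectorFromSet(labels.Set{
	"machineconfiguration.openshift.io/pool": pool.Name,
})
buildPods, err := ctrl.podLister.Pods("openshift-machine-config-operator").List(selector)
if err != nil {
	return err
}
// buildPods now holds only this pool's ephemeral build pods.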

ibr, err := ctrl.prepareForBuild(inputs)
var ourConfig *mcfgv1alpha1.MachineOSConfig
for _, c := range machineOSConfigs {
if c.Spec.MachineConfigPool.Name == mosb.Spec.MachineConfigPool.Name {
Member

thought: Since it looks like we're repeating this pattern here and in NodeController, let's add a helper in the apihelpers package that does something like:

func GetMachineOSConfigForPool(moscLister MoscListerInterface, pool *mcfgv1.MachineConfigPool) (*mcfgv1alpha1.MachineOSConfig, error) {
	machineOSConfigs, err := moscLister.List(labels.Everything())
	if err != nil {
		return nil, err
	}

	for _, c := range machineOSConfigs {
		if c.Spec.MachineConfigPool.Name == pool.Name {
			return c, nil
		}
	}

	// https://github.com/kubernetes/apimachinery/blob/master/pkg/api/errors/errors.go#L144
	return nil, apierrors.NewNotFound(...)
}

Because we return an error if we can't find the object, the caller can check for that error and determine what to do next. We should also add an equivalent helper for MachineOSBuilds, although for that case we may want to return a list instead.
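
Caller-side, that could look like the following sketch (apierrors.IsNotFound is the standard check from k8s.io/apimachinery/pkg/api/errors):

mosc, err := apihelpers.GetMachineOSConfigForPool(ctrl.machineOSConfigLister, pool)
if apierrors.IsNotFound(err) {
	// No MachineOSConfig for this pool: it has not opted into layering.
	return nil
}
if err != nil {
	return err
}
// The pool is opted in; proceed with mosc.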

}

ps := newPoolState(pool)
var mosb *mcfgv1alpha1.MachineOSBuild
Member

thought: Let's hoist this into the apihelpers package. See: https://github.com/openshift/machine-config-operator/pull/4271/files#r1532244342.

Also, we may eventually have multiple builds per pool, depending on how we decide to implement multi-arch builds.
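
The MachineOSBuild equivalent might then return every build for the pool (hypothetical helper, mirroring the MachineOSConfig one suggested above and anticipating multiple builds per pool):

func GetMachineOSBuildsForPool(mosbLister MosbListerInterface, pool *mcfgv1.MachineConfigPool) ([]*mcfgv1alpha1.MachineOSBuild, error) {
	builds, err := mosbLister.List(labels.Everything())
	if err != nil {
		return nil, err
	}

	matched := []*mcfgv1alpha1.MachineOSBuild{}
	for _, mosb := range builds {
		// MachineOSBuild carries its pool in spec, same as MachineOSConfig.
		if mosb.Spec.MachineConfigPool.Name == pool.Name {
			matched = append(matched, mosb)
		}
	}
	return matched, nil
}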

Output: buildv1.BuildOutput{
To: &corev1.ObjectReference{
Name: i.FinalImage.Pullspec,
Name: i.MachineOSConfig.Spec.BuildInputs.FinalImagePullspec,
@cheesesashimi cheesesashimi (Member) Mar 20, 2024

note: It is not obvious (and I apologize for that) that this value is mutated from what is retrieved from the on-cluster-build-config ConfigMap. This occurs in the getOnClusterBuildConfig() method. Specifically, the name of the rendered MachineConfig replaces the user-provided tag (so, for example, quay.io/cdoern/origin-release:latest would end up as something like quay.io/cdoern/origin-release:rendered-worker-<hash>). The user-provided tag is ignored, although maybe we should update the user-provided tag instead.

(To be clear: I'm just bringing this to your attention.)

VolumeSource: corev1.VolumeSource{
Secret: &corev1.SecretVolumeSource{
SecretName: i.FinalImage.PullSecret.Name,
SecretName: i.MachineOSConfig.Spec.BuildInputs.FinalImagePullspec,
Member

question: Did you mean i.MachineOSConfig.Spec.BuildInputs.FinalImagePushSecret.Name here?

ps.DeleteBuildRefForCurrentMachineConfig()
// Set the annotation or field to point to the newly-built container image.
klog.V(4).Infof("Setting new image pullspec for %s to %s", ps.Name(), imagePullspec)
ps.SetImagePullspec(imagePullspec)
Member

note: This value should be provided by the FinalPullspec() method on both the ImageBuilder and CustomPodBuilder objects. The reason is that both of those objects return the digested image pullspec (e.g., quay.io/cdoern/origin-release@sha256:<digest>, which pins the exact built image) instead of the tagged image pullspec.

@cdoern cdoern force-pushed the ocb-api branch 6 times, most recently from e7c8704 to fabc9b2 Compare March 25, 2024 13:16
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 25, 2024
@cdoern cdoern force-pushed the ocb-api branch 2 times, most recently from a6eb27c to 6e8eca8 Compare March 26, 2024 00:30
@cdoern cdoern marked this pull request as ready for review April 12, 2024 13:03
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 12, 2024
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 12, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented Apr 12, 2024

@cdoern: This pull request references MCO-1042 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

} {
}

// this is for the FG
Contributor Author

we might need this @inesqyx; otherwise clusters may fail trying to list this type? I am unsure

You'll need to make sure the MOSC and MOSB CRDs are installed correctly upon cluster installation or else this'll cause issues.

Signed-off-by: Charlie Doern <[email protected]>
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 15, 2024
@openshift-merge-robot
Contributor

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci
Contributor

openshift-ci bot commented Apr 15, 2024

@cdoern: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/images | e728f9d | link | true | /test images
ci/prow/bootstrap-unit | e728f9d | link | false | /test bootstrap-unit
ci/prow/e2e-gcp-op | e728f9d | link | true | /test e2e-gcp-op
ci/prow/e2e-aws-ovn | e728f9d | link | true | /test e2e-aws-ovn
ci/prow/e2e-aws-ovn-upgrade | e728f9d | link | true | /test e2e-aws-ovn-upgrade
ci/prow/e2e-gcp-op-single-node | e728f9d | link | true | /test e2e-gcp-op-single-node
ci/prow/e2e-hypershift | e728f9d | link | true | /test e2e-hypershift
ci/prow/okd-scos-e2e-aws-ovn | e728f9d | link | false | /test okd-scos-e2e-aws-ovn
ci/prow/e2e-aws-ovn-upgrade-out-of-change | e728f9d | link | false | /test e2e-aws-ovn-upgrade-out-of-change
ci/prow/e2e-openstack | e728f9d | link | false | /test e2e-openstack
ci/prow/e2e-azure-ovn-upgrade-out-of-change | e728f9d | link | false | /test e2e-azure-ovn-upgrade-out-of-change
ci/prow/security | e728f9d | link | false | /test security
ci/prow/e2e-gcp-op-techpreview | e728f9d | link | false | /test e2e-gcp-op-techpreview
ci/prow/okd-scos-images | e728f9d | link | true | /test okd-scos-images
ci/prow/okd-images | e728f9d | link | false | /test okd-images
ci/prow/unit | e728f9d | link | true | /test unit
ci/prow/verify | e728f9d | link | true | /test verify

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rioliu-rh

/hold for QE testing

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2024
@sergiordlr
Contributor

Pre-merge testing has been tracked here: https://issues.redhat.com/browse/MCO-1149

/label qe-approved

@openshift-ci openshift-ci bot added the qe-approved Signifies that QE has signed off on this PR label May 7, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 7, 2024

@cdoern: This pull request references MCO-1042 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@sergiordlr
Contributor

/remove-label qe-approved

@openshift-ci openshift-ci bot removed the qe-approved Signifies that QE has signed off on this PR label May 8, 2024
@openshift-ci-robot
Contributor

openshift-ci-robot commented May 8, 2024

@cdoern: This pull request references MCO-1042 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.16.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-bot
Contributor

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 7, 2024
@openshift-bot
Contributor

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Sep 6, 2024
@openshift-bot
Contributor

Rotten issues close after 30d of inactivity.

Reopen the issue by commenting /reopen.
Mark the issue as fresh by commenting /remove-lifecycle rotten.
Exclude this issue from closing again by commenting /lifecycle frozen.

/close

@openshift-ci openshift-ci bot closed this Oct 7, 2024
@openshift-ci
Contributor

openshift-ci bot commented Oct 7, 2024

@openshift-bot: Closed this PR.

In response to this: /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
